AITopics | vanishing regret

Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes

Neural Information Processing Systems, Dec-25-2025, 05:41:49 GMT

We show for the first time that it is possible to reconcile in online learning in zero-sum games two seemingly contradictory objectives: vanishing time-average regret and non-vanishing step sizes. This phenomenon, which we call "fast and furious" learning in games, sets a new benchmark for what is possible both in max-min optimization and in multi-agent systems. Our analysis does not depend on introducing a carefully tailored dynamic. Instead, we focus on the most well-studied online dynamic, gradient descent. Similarly, we focus on the simplest textbook class of games, two-agent two-strategy zero-sum games, such as Matching Pennies. Even for this simplest of benchmarks, the best known bound for total regret prior to our work was the trivial one of $O(T)$, which applies even to a non-learning agent. Based on a tight understanding of the geometry of the non-equilibrating trajectories in the dual space, we prove a regret bound of $\Theta(\sqrt{T})$, matching the well-known optimal bound for adaptive step sizes in the online setting. This guarantee holds for all fixed step sizes, without having to know the time horizon in advance and adapt the fixed step size accordingly. As a corollary, we establish that even with fixed learning rates the time-averages of mixed strategies and utilities converge to their exact Nash equilibrium values. We also provide experimental evidence suggesting the stronger regret bound holds for all zero-sum games.
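To make the setting in the abstract concrete, here is a minimal sketch of fixed-step-size gradient descent in Matching Pennies that tracks the first player's total regret. It assumes the "lazy" (dual-averaging) form of gradient descent with Euclidean projection onto the simplex; whether this matches the paper's exact dynamic in every detail is an assumption, and the function name, the starting strategies `p0` and `q0`, and the step size `eta` are illustrative choices rather than values from the source.

```python
import math


def clip01(z):
    """Project a scalar onto [0, 1] (Euclidean projection for a 2-strategy simplex)."""
    return min(1.0, max(0.0, z))


def simulate_matching_pennies(T, eta, p0=0.6, q0=0.5):
    """Run lazy gradient descent with a fixed step size eta for T rounds.

    p, q are the probabilities that player 1 / player 2 put on their first
    strategy. Player 1's expected payoff in Matching Pennies is
    (2p - 1)(2q - 1); player 2 receives the negative of that.
    Returns player 1's total regret after each round.
    """
    s1 = 0.0  # cumulative payoff of player 1's pure strategy 1 (strategy 2 gets -s1)
    s2 = 0.0  # cumulative payoff of player 2's pure strategy 1 (strategy 2 gets -s2)
    total_payoff = 0.0  # player 1's realized cumulative expected payoff
    regrets = []
    for _ in range(T):
        # Strategies are projections of the scaled cumulative payoffs.
        p = clip01(p0 + eta * s1)
        q = clip01(q0 + eta * s2)
        total_payoff += (2 * p - 1) * (2 * q - 1)
        # Both players observe this round's payoffs simultaneously.
        s1 += 2 * q - 1
        s2 += -(2 * p - 1)
        # Best fixed pure strategy in hindsight earns max(s1, -s1) = |s1|.
        regrets.append(abs(s1) - total_payoff)
    return regrets


if __name__ == "__main__":
    T = 10_000
    regrets = simulate_matching_pennies(T, eta=0.1)
    print("total regret:", regrets[-1])
    print("time-average regret:", regrets[-1] / T)
    print("sqrt(T) for comparison:", math.sqrt(T))
```

Running the script for several horizons `T` with the same `eta` gives a rough empirical check of whether total regret grows more like $\sqrt{T}$ than like $T$; it is an experiment under the assumptions above, not a reproduction of the authors' results.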

fast and furious learning, vanishing regret, zero-sum game, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Neural Information Processing Systems, Dec-23-2025, 20:20:50 GMT

In this paper, a rather general online problem called dynamic resource allocation with capacity constraints (DRACC) is introduced and studied in the realm of posted price mechanisms. This problem subsumes several applications of stateful pricing, including but not limited to posted prices for online job scheduling and matching over a dynamic bipartite graph. As the existing online learning techniques do not yield vanishing-regret mechanisms for this problem, we develop a novel online learning framework defined over deterministic Markov decision processes with dynamic state transition and reward functions. We then prove that if the Markov decision process is guaranteed to admit an oracle that can simulate any given policy from any initial state with bounded loss (a condition that is satisfied in the DRACC problem), then the online learning problem can be solved with vanishing regret. Our proof technique is based on a reduction to online learning with switching cost, in which an online decision maker incurs an extra cost every time she switches from one arm to another. We formally demonstrate this connection and further show how DRACC can be used in our proposed applications of stateful pricing.
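The switching-cost model mentioned at the end of the abstract can be illustrated with a small sketch. It only shows how a decision maker's total cost is tallied when each change of arm carries an extra charge; it is not the paper's reduction or mechanism, and the function name, the loss table, and the `switch_cost` value are hypothetical.

```python
def total_cost_with_switching(losses, choices, switch_cost=1.0):
    """Sum per-round losses plus a fixed penalty for every change of arm.

    losses[t][arm] is the loss of playing `arm` in round t;
    choices[t] is the arm actually played in round t.
    """
    total = 0.0
    for t, arm in enumerate(choices):
        total += losses[t][arm]
        # Charge the extra cost whenever the arm differs from the previous round.
        if t > 0 and arm != choices[t - 1]:
            total += switch_cost
    return total


if __name__ == "__main__":
    # Hypothetical loss table: 3 rounds, 2 arms.
    losses = [[0.2, 0.8], [0.9, 0.1], [0.3, 0.7]]
    print(total_cost_with_switching(losses, [0, 1, 1]))  # one switch
    print(total_cost_with_switching(losses, [0, 0, 0]))  # no switches
```

Comparing the two calls shows the trade-off the model captures: following the per-round best arm can lower the loss term while raising the switching term.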

dynamic deterministic markov decision process, name change, vanishing regret, (9 more...)

Neural Information Processing Systems

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reviews: Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes

Neural Information Processing Systems, Jan-22-2025, 20:01:45 GMT

This paper studies gradient descent with a fixed step size for 2x2 zero-sum games. The key theoretical result is that while in discrete time, FTRL only guarantees linear regret for a fixed step size, by utilizing the geometry in gradient descent, one may obtain sublinear regret in 2x2 zero-sum games. The authors show that after a long enough time, strategies will lie on the boundary of the strategy space. The main result (Theorem 2) is obtained by characterizing the exact dynamics for gradient descent, suitably selecting a dual space, decomposing each "rotation" into distinct components, analyzing the change in energy in each iteration for each component and showing that time is quadratic in the number of partitions (rotations). The authors give an example which satisfies a matching lower bound, showing that their analysis is tight.

fast and furious learning, non-vanishing step size, zero-sum game, (9 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.37)

Technology:

Information Technology > Game Theory (0.87)
Information Technology > Artificial Intelligence > Machine Learning (0.81)

Review for NeurIPS paper: Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Neural Information Processing Systems, Jan-22-2025, 10:24:38 GMT

Summary and Contributions: This paper considers a dynamic resource pricing problem. Agents arrive over time requesting resources, and the resources themselves become available and unavailable over time. The system sets (dynamic) prices on resources at each moment in time, and agents then choose the resources that maximize their utility given the prices. The agent requests are adversarial, and the goal is to select a pricing policy that minimizes regret. The main contribution is a policy with vanishing regret for a very general formulation of such allocation problems.

dynamic deterministic markov decision process, initial state, vanishing regret, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Review for NeurIPS paper: Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Neural Information Processing Systems, Jan-22-2025, 10:24:31 GMT

The reviewers evaluated the paper, and there was broad agreement that this is quality work. However, there were still some concerns about readability, so we urge you to revise the submission for the camera-ready version.

dynamic deterministic markov decision process, neurips paper, vanishing regret, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes

Neural Information Processing Systems, Oct-9-2024, 19:44:28 GMT

We show for the first time that it is possible to reconcile in online learning in zero-sum games two seemingly contradictory objectives: vanishing time-average regret and non-vanishing step sizes. This phenomenon, which we call "fast and furious" learning in games, sets a new benchmark for what is possible both in max-min optimization and in multi-agent systems. Our analysis does not depend on introducing a carefully tailored dynamic. Instead, we focus on the most well-studied online dynamic, gradient descent. Similarly, we focus on the simplest textbook class of games, two-agent two-strategy zero-sum games, such as Matching Pennies. Even for this simplest of benchmarks, the best known bound for total regret prior to our work was the trivial one of O(T), which applies even to a non-learning agent.

fast and furious learning, non-vanishing step size, vanishing regret, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Neural Information Processing Systems, Oct-9-2024, 17:21:38 GMT

In this paper, a rather general online problem called dynamic resource allocation with capacity constraints (DRACC) is introduced and studied in the realm of posted price mechanisms. This problem subsumes several applications of stateful pricing, including but not limited to posted prices for online job scheduling and matching over a dynamic bipartite graph. As the existing online learning techniques do not yield vanishing-regret mechanisms for this problem, we develop a novel online learning framework defined over deterministic Markov decision processes with dynamic state transition and reward functions. We then prove that if the Markov decision process is guaranteed to admit an oracle that can simulate any given policy from any initial state with bounded loss (a condition that is satisfied in the DRACC problem), then the online learning problem can be solved with vanishing regret. Our proof technique is based on a reduction to online learning with switching cost, in which an online decision maker incurs an extra cost every time she switches from one arm to another.

dynamic deterministic markov decision process, online, vanishing regret, (5 more...)

Neural Information Processing Systems

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes

Bailey, James, Piliouras, Georgios

Neural Information Processing Systems, Mar-19-2020, 02:01:28 GMT

We show for the first time that it is possible to reconcile in online learning in zero-sum games two seemingly contradictory objectives: vanishing time-average regret and non-vanishing step sizes. This phenomenon, which we call "fast and furious" learning in games, sets a new benchmark for what is possible both in max-min optimization and in multi-agent systems. Our analysis does not depend on introducing a carefully tailored dynamic. Instead, we focus on the most well-studied online dynamic, gradient descent. Similarly, we focus on the simplest textbook class of games, two-agent two-strategy zero-sum games, such as Matching Pennies.

fast and furious learning, non-vanishing step size, vanishing regret, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.87)
